Now, I'd like to talk about why testing is hard. As we discussed last time, software testing has lots of benefits. It's the only defect detection technique that checks the whole system, including all the tools you're using: the operating system, the processor, the compiler, and so on. It's currently the best way to assess certain kinds of system behavior, like performance. And customers need it before they'll be willing to accept our product.

But testing is hard. Why do I say that? Imagine that you are this person here, looking at a wall. The wall represents all the possible behaviors of the system, but all you can see of them is through a few tiny holes in it. You have to peer through those holes and try to work out what's going on behind the wall. That's roughly what software testing is like. It's inherently hard; there's no way to make it easy, and let me demonstrate why.

So the problem with software testing,
as opposed to testing other things, is that it only samples a set of possible behaviors. Now, we sample all the time: when we build bridges, when we check steel girders, when we study the performance of cheetahs running across the savanna. So sampling isn't a new idea, but software is fundamentally different from most of these testing problems. Unlike physical systems, most software systems are discontinuous.

What do I mean by that? Take a typical engineering application, like checking the strength of a steel girder. You would test it under increasing loads, load A, load B, load C, until eventually the girder breaks. If you ran this test 100 times with different girders at different loads, you'd have a good sense of where a girder is likely to break. Sample at those loads, draw a line through the results, and you can predict the girder's behavior at any point along that line, including loads you never tested. But software is not like that. It's
like building a bridge and checking its strength. You put some people on the bridge and ask: does it hold up? Then you put a whole bunch of dump trucks on it, really heavy loads, and ask again: does it hold up? It does. So you open the bridge, and one Friday a little yellow Volkswagen drives across and the whole thing collapses. That's what happens with software: its behavior isn't continuous, so you can't draw those lines through it.

To picture this, think of what's called the state space: a big cloud of all the possible states the system can be in. When programmers make mistakes, the resulting bugs end up scattered throughout that cloud. When we test, we're doing little point checks of the software, and a point check doesn't necessarily tell us anything about the points nearby. If we don't hit a bug exactly, we won't know that buggy behavior is lurking very close to the value we chose. So to really verify the software, we'd have to consider every possible state of the system. But even small systems have trillions and trillions of possible states, so we can't check software exhaustively. What we can do is try to check it effectively, knowing that our checks will always be incomplete because software is discontinuous.

So let me give you an example of that,
just to make it more concrete. This example comes from Microsoft (thank you, Microsoft) and its Zune media player. If you were one of the few people who owned a Zune, you couldn't use it to play music on New Year's Eve of 2008. In fact, every 30 GB Zune froze and effectively turned into a brick. The cause was the code you see in front of you. It looks fairly simple, and it doesn't even have anything to do with playing music. It runs at initialization, and its job is to set the current date. The Zune recorded the date as the number of days since 1980, and to display it to the user, the code converts that count into a year and a day within that year.

I want you to stare at this code for a little bit. You know it turned the Zune into a brick; basically, the device wouldn't run. See if you can figure out why.

Okay, let's take a look. If you figured it out, great. If not, no problem; we'll walk through exactly what happened. We know the Zune bricked on New Year's Eve of 2008, and that's a clue: 2008 is a leap year. So the Zune had a problem on the last day of a leap year. Let's look at the code with that in mind. The very last day of a leap year is day number 366, so rather than talking about 2008 specifically, let's just consider day 366 of any leap year. Walking through the code: is days greater than 365? Yes, 366 is greater than 365, so we enter the loop. Is it a leap year? Yes. Is days greater than 366? No, it's exactly 366, so we skip the code inside that branch.
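
The code on the slide isn't included in this transcript, so here is a minimal Python transliteration of the loop as the walkthrough describes it, useful for following the trace. It is a sketch under stated assumptions: days counts from 1 (day 1 is January 1, 1980); the names `convert_days_buggy`, `convert_days_fixed`, `is_leap_year`, and `max_iters` are hypothetical; and the iteration cap exists only so the demo can report the hang instead of freezing. The fixed version breaks out of the loop on day 366 of a leap year rather than switching to a `>= 366` comparison, which under this numbering would yield day 0 of the following year.

```python
from typing import Optional, Tuple

ORIGIN_YEAR = 1980  # assumption: day 1 is January 1, 1980


def is_leap_year(year: int) -> bool:
    return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0


def convert_days_buggy(days: int, max_iters: int = 1000) -> Optional[Tuple[int, int]]:
    """Transliteration of the buggy loop (hypothetical name).

    The original had no iteration limit; max_iters exists only so this
    demo can report the hang. Returns (year, day), or None if stuck.
    """
    year = ORIGIN_YEAR
    iters = 0
    while days > 365:
        iters += 1
        if iters > max_iters:
            return None  # the real loop would spin forever here
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # days == 366: nothing changes, so the loop never exits
        else:
            days -= 365
            year += 1
    return year, days


def convert_days_fixed(days: int) -> Tuple[int, int]:
    """Same conversion, but stop on day 366 of a leap year (December 31)."""
    year = ORIGIN_YEAR
    while days > 365:
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            else:
                break  # days == 366 is the last day of a leap year
        else:
            days -= 365
            year += 1
    return year, days


if __name__ == "__main__":
    # Boundary values: first and last days of leap and regular years,
    # plus December 31, 2008 (day 10593 under the numbering above).
    for d in [1, 365, 366, 367, 731, 732, 10593]:
        buggy = convert_days_buggy(d)
        print(d, "buggy:", buggy if buggy else "HANGS", "fixed:", convert_days_fixed(d))
```

Running it, only days 366 and 10593 (December 31 of the leap years 1980 and 2008) make the buggy version hang; every other boundary value converts normally. That is the point of the example: a test suite only finds this bug if someone deliberately picks the last day of a leap year as an input.
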
So we fall out of that inner if, and because it is a leap year, we don't take the else branch either. We go back to the top of the loop and run it again, and we're exactly where we started: still at day 366, with nothing changed. Days is still not greater than 366, so we go around again, and again. What we have here is an infinite loop. The comparison doesn't handle this boundary: when days is exactly 366 in a leap year, it's December 31st and the loop needs to stop, but nothing inside the loop changes days or exits. The fix is to handle that case explicitly, for example by breaking out of the loop when days equals 366; just changing the comparison to greater than or equal to 366 would stop the hang, but it would report day 0 of the following year instead of December 31st.

To catch this bug, you have to be pretty clever: you have to pick exactly the right input to make the code exhibit this erroneous behavior. So let's look at what it takes to be a good tester. To find this bug, we need knowledge of the system we're testing. We have to know how days and years work and which values might be problematic. It turns out that a lot of programmers
make mistakes at boundary conditions. For years, the boundary conditions are things like the first and the last day of the year, and in a leap year the last day is a different day than in a regular year. Another place where people make mistakes is unexpected values, like a negative year or a negative day. Given one of those, the program may do the wrong thing, and it may even get stuck in an infinite loop.

So you have to think like a helpful adversary. You have to think about the code and its behavior and ask yourself: what can I do to break this thing? Maybe a really big day number will cause a problem. Or a negative day number, or a day number right on one of these boundaries.

So to test well, we need to know the requirements of the system; testing from the requirements is called black box testing. In many cases, we also have to examine the code, so that we know where those boundary conditions are. That's called white box testing: we test while looking at the code, trying to find the areas where programmers are likely to make mistakes. Over the next several lectures, and even courses, we're going to discuss test metrics and test generation strategies that make it more likely you'll find these kinds of errors.

So let's look at testing from 10,000 feet. We can think about testing
in terms of scale. What does this mean? Usually we want to divide and conquer. We start by testing small, because small things can be tested more rigorously. So we have unit tests, which we apply to small pieces of code, usually at the class level in Java. Then we have integration tests, where we start checking bigger pieces of functionality to see whether a subsystem does what it's supposed to do and how well it performs. Finally, we have system tests, where we put everything together and throw tests at it to make sure it does what the customer expects, usually according to the customer's use cases. One thing to remember, though, is that in the web world the concept of a system is a bit fluid, especially when you're using services. So figuring out where to draw your system boundaries can be part of the testing problem.

Another dimension we'll look
at is process: when should we write the tests? One approach that has gained more use is test-driven development: you write the tests before you write the code, and you judge how mature the code is by how many of the tests pass. The standard approach, though, is still test-after: we write the tests either while we're writing the software or after writing it, and we check whether the software does the right thing. In either case, even if you write the tests first, there's always going to be iteration. We're going to change the code and want to re-test, and the more often you re-test, the sooner you find the errors.

Another dimension to consider is purpose. How you test the functionality of the system is different from how you test its performance, especially at, say, web scale. If you want to test for things like security, you may focus on different parts of the code than you would when testing usability. And finally there's availability, which has become a lot more prominent and important for enterprise-level applications, where you sometimes face genuinely chaotic situations. Netflix uses a tool called Chaos Monkey that deliberately terminates parts of the running system, such as server instances, to see how the system responds.

So we're going to use these three axes of testing, scale, process, and purpose, to come up with techniques and mechanisms that are effective along each one. To recap, we're going to try to give you lots of techniques in this analysis space so that you can become an effective tester.